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Abstract 

We introduce a method that uses the Cauchy-Crofton formula and a 
new curvature formula from integral geometry to reweight the sampling 
probabilities of Metropolis-within-Gibbs algorithms in order to increase 
their convergence speed. We consider algorithms that sample from a prob¬ 
ability density conditioned on a manifold M. Our method exploits the 
symmetries of the algorithms’ isotropic random search-direction subspaces 
to analytically average out the variance in the intersection volume caused 
by the orientation of the search-subspace with respect to the manifold M 
it intersects. This variance can grow exponentially with the dimension 
of the search-subspace, greatly slowing down the algorithm. Eliminating 
this variance allows us to use search-subspaces of dimensions many times 
greater than would otherwise be possible, allowing us to sample very rare 
events that a lower-dimensional search-subspace would be unlikely to in¬ 
tersect. 

To extend this method to events that are rare for reasons other than 
their support M having a lower dimension, we formulate and prove a new 
theorem in integral geometry that makes use of the curvature form of the 
Chern-Gauss-Bonnet theorem to reweight sampling probabilities. On the 
side, we also apply our theorem to obtain new theoretical bounds for the 
volumes of real algebraic manifolds. 

Finally, we demonstrate the computational effectiveness and speedup 
of our method by numerically applying it to the conditional stochastic 
Airy operator sampling problem in random matrix theory. 

Keywords: integral and stochastic geometry, Metropolis-within-Gibbs algorithms, mani¬ 
fold MCMC, Chern-Gauss-Bonnet theorem, Cauchy-Crofton formula, random matrices, real 
algebraic manifold volume bounds 
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1 Introduction 


Applications of sampling on probability distributions, defined on Euclidean 
space or on other manifolds, arise in many fields, such as Statistics 1130 HU, 
Machine Learning [5] , Statistical Mechanics |32l, General Relativity (U, Molec¬ 
ular Biology [U, Linguistics [H], and Genetics HU. One application of special 
interest to us is random matrix theory, where we would like to compute statis¬ 
tics for the eigenvalues of random matrices under certain eigenvalue constraints. 
In many cases these probability distributions are difficult to sample from with 
straightforward methods such as rejection sampling because the events we are 
conditioning on are very rare, or the probability density concentrates in some 
small regions of space. Typically, the complexity of sampling from these distri¬ 
butions grows exponentially with the dimension of the space. In such situations, 
we require alternative sampling methods whose complexity promises not to grow 
exponentially with dimension. In Markov chain Monte Garlo (MGMG) algo¬ 
rithms, one of the most commonly used such methods, we run a Markov chain 
over the manifold that converges to the desired probability distribution |24| . 
Unfortunately, in many situations MGMG algorithms still suffer from inefficien¬ 
cies that cause the Markov chain to have very long (oftentimes exponentially 
long) convergence times fBH [2H1IHII] . 

To illustrate these inefficiencies and our proposed fix, we imagine we would 
like to sample uniformly from a manifold A4 C (as illustrated in dark blue 

in Figure 1.) By uniformly, we can imagine that A4 has finite volume, and the 
probability of being picked in a region is equal to the volume of that region. 
More generally, we can put a probability measure on JH and sample from that 
measure. 

We consider algorithms that produce a sequence of points {xi,X 2 , ■ ■ ■} (yel¬ 
low dots in Figure 1) with the property that Xi+i will be chosen somehow in 
an (isotropically generated) random plane S (red plane in Figure 1) centered at 
Xi- Further, the step from Xi to Xi+i is independent of all the previous steps 
(Markov chain property.) This situation is known as a Gibbs sampling Markov 
chain with isotropic random search-subspaces. 

For our purposes, we find it helpful to pick a sphere (light blue) of radius 
r that represents the length of the jump we might wish to take upon stepping 
from Xi to Xi^i- Note that r is usually random. The sphere will be the natural 
setting to mathematically exploit the symmetries associated with isotropically 
distributed planes. Intersecting with the sphere, the plane S becomes a great 
circle S (red), and the manifold becomes a submanifold (blue) of the sphere. 
Assuming we take a step length of r, then necessarily Xi+i must be on the 
intersection (green dots in Figure 1, higher-dimensional submanifolds in more 
general situations) of the red great circle and the blue submanifold. 

For definitiveness, suppose our ambient space is where n = 2, our blue 
manifold Ai has codimension fc = 1, and our search-subspaces have dimension 
fc -|- 1. Our sphere now has dimension n and the great circle dimension fc = 1. 
The intersections (green dots) of the great circle with M. are 0-dimensional 
points. 
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We now turn to the specifics of how Xi+i may be chosen from the intersection 
of the red curve and the blue curve. Every green point is on the intersection 
of the blue manifold and the red circle. It is worth pondering the distinction 
between shallower angles of intersection, and steeper angles. If we thicken the 
circle by a small constant thickness e, we see that a point with a shallow angle 
has a larger intersection than a steep angle. Therefore points with shallow 
angles should be weighted more. Figureillustrates that is the proper 

weighting for an intersection angle of 0^. 

We will argue that the distinction between shallower and steeper angles 
takes on a false sense of importance and traditional algorithms may become 
unnecessarily inefficient accordingly. A traditional algorithm focuses on the 
specific red circle that happens to be generated by the algorithm and then 
gives more weight to intersection points with shallower angles. We propose that 
knowledge of the isotropic distribution of the red circle indicates that all angles 
may be given the same weight. Therefore, any algorithmic work that goes into 
weighting points unequally based on th e ang le of intersection is wasted work. 

Specifically, as we will see in Section 


3.2 


sin(0i) 


has infinite variance, due in 


part to the fact that ^ can become arbitrarily large for small enough 9i. The 
algorithm must therefore search through a large fraction of the (green) intersec¬ 
tion points before converging because any one point could contain a signifiant 
portion of the conditional probability density, provided that its intersection an¬ 
gle is small enough. This causes the algorithm to sample the intersection points 
very slowly in situations where the dimension is large and there are typically 
exponentially many possible intersection points to sample from. 

This paper justifies the validity of the angle-independent approach through 
the mathematics of integral geometry ISD EH ini [HI [SI, and the Cauchy- 
Crofton formula in particular in Section We should note that sampling all 
the intersection points with equal probability cannot work for just any choice of 
random search-subspace S. For instance, if the search-subspaces are chosen to 
be random longitudes on the 2-sphere, parts of AI that have a nearly east-west 
orientation would be sampled frequently but parts of M. that have nearly north- 
south orientation would be almost never sampled, introducing a statistical bias 
to the samples in favor of the east-west oriented samples. However, if S is chosen 
to be isotropically random, the random orientation of S does not favor either 
the north-south nor the east-west parts of M., suggesting that we can sample the 
intersection points with equal probability in this situation without introducing 
a bias. Effectively, by sampling with equal probability weights and isotropic 
search-subspaces we will use integral geometry to compute an analytical average 
of the weights, an average that we would otherwise compute numerically, thereby 
freeing up computational resources and speeding up the algorithm. 

In Part II of this paper, we perform a numerical implementation of an ap¬ 
proximate version of the above algorithm in order to sample the eigenvalues of 
a random matrix conditioned on certain rare events involving other eigenvalues 
of this matrix. We obtain different histograms from these samples weighted 
according to both the traditional weights as well as integral geometry weights 
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Figure 1: In this example we wish to generate random samples on a codimension- 
k manifold M. C K" (dark blue) with a Metropolis-within-Gibbs Markov chain 
{xi,X 2 , ■ ■ ■} that uses isotropic random search-subspaces S (light red) centered 
at the most recent point Xi {k = l,n = 3in figure). We will consider the sphere 
rS" of an arbitrary radius r centered at Xi (light blue), allowing us to make use 
of the spherical symmetry in the distribution of the random search-subspace 
to improve the algorithm’s convergence speed. S now becomes an isotropically 
distributed random great fc-sphere S = S D rS" (dark red), that intersects a 
codimension-fc submanifold A4 = A4 Ci rS" of the great sphere. 


(Figure]^ Figures and 10 in part II). We find that using integral geometry 
greatly reduces the variance of the weights. For instance, the integral geometry 
weights normalized by the median weight had a sample variance of 3.6 x 10^, 578, 
and 1879 times smaller than the traditional weights, respectively, for the top, 
middle, and bottom simulations of Figure This reduction in variance allows 
us to get faster-converging (i.e., smoother for the same number of data points) 
and more accurate histograms in Figure In fact. Section shows that the 
traditional weights have infinite variance due to their second-order heavy tailed 
probability density, so the sample variance tends to increase greatly as more 
samples are taken. Because of the second-order heavy-tailed behavior in the 
weights, the smoother we desire the histogram to be, the greater the speed up 
in the convergence time obtained by using the integral geometry weights in place 
of the traditional weights. 


Remark 1. Since we are using an approximate truncated version of the full al¬ 
gorithm that is not completely asymptotically accurate, the integral geometry 
weights also cause an increase in asymptotic accuracy. The full MCMC algo- 
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Figure 2: Conditional on the next sample point Xi^i lying a distance r from 
Xi, the algorithm must randomly choose Xi+i from a probability distribution 
on the intersection points (middle, green) of the manifold JH with the isotropic 
random great circle S (red). If traditional Gibbs sampling is used, intersection 
points with a very small angle of intersection Oi must be sampled with a much 
greater (unnormalized) probability ^ (right, top) than intersection points 
with a large angle (right, bottom). This greatly increases the variance in the 
sampling probabilities for different points and slows down the convergence of 
the method used to generate the next sample Xi^i. However, since S is isotrop¬ 
ically distributed on rS", the symmetry of the isotropic distribution of S allows 
us to use the Cauchy-Crofton formula from integral geometry to analytically 
average out these unequal probability weights so that every intersection point 
now has the same weight, freeing the algorithm from the time-consuming task 
of effectively computing this average numerically. 


rithm should have perfect asymptotic accuracy, so we expect this increase in 
accuracy to become an increase in convergence speed if we allow the Markov 
chain to mix for a longer amount of time. 

For situations where the intersections are higher-dimensional submanifolds 
rather than individual points, we show in Section ^ that the angle-independent 
approach generalizes to a curvature-dependent approach. We stress that tradi¬ 
tional algorithms condition only on the plane that was actually generated while 
ignoring its isotropic distribution. By taking the isotropy into account, our 
algorithm can use the curvature information of the manifold to compute an an¬ 
alytical average of the local intersection volumes (local in a second-order sense) 
with all possible isotropically distributed search-subspaces, greatly reducing the 
variance of the volumes. 

Higher-dimensional intersections occur in many (perhaps most) situations, 
such as applications with events that are rare for reasons other than that their 
associated submanifold has high codimension. In these situations, the probabil- 
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Figure 3: Histograms from 3 random matrix simulations (see Sections and 
where we seek the distribution of an eigenvalue given conditions on one or 
more other eigenvalues. In all three figures, the blue curve uses the integral 
geometry weights proposed in this paper, the red curve uses traditional weights, 
and the black curve (only in the top two figures) is obtained by the accurate 
but very slow rejection sampling method. Two things worth noticing is that 
the integral geometry weight curve is more accurate than the traditional weight 
curve (at least when we have a rejection sampling curve to compare), and that 
the integral geometry weight curve is smoother than the traditional weight curve. 
The integral geometry algorithm achieves these benefits in part because of the 
much smaller variance in the weights. (In these three cases the integral geometry 
sample variance is smaller by a factor of « 10®, 600, and 2000 respectively) 
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Figure 4: In this example a collection Ai of n — 1-dimensional spheres (blue, 
left) is intersected (intersection depicted as green circles) by a random search- 
subspace S (red). The spheres that S intersects farther form their center will 
have a much smaller intersection volume than the spheres that S intersects 
closer to their center, with the variance in the intersection volumes increasing 
exponentially in the dimension d of S (logarithmic plot, right). This curse of 
dimensionality for the intersection volume can lead to an exponential slowdown 
when using a traditional algorithm to sample from S' H A1. In Section we 
will see that this slowdown can be avoided if we use the curvature information 
to reweight the intersection volumes, reducing the variance in the intersection 
volumes. 


ity of a low-dimensional search-subspace intersecting Ai can be very small, so 
one may wish to use a search-subspace S of dimension d that is greater than 
the codimension fc of in order to increase the probability of intersecting A4. 

As we will see in Section |4.7| the traditional approach can lead to a huge 
variance in the intersection volumes that increases exponentially with the dif¬ 
ference in dimension d — k (Figure]^ right). This exponentially large variance 
leads to the same type of algorithmic slowdowns of the traditional algorithm 
as the variance in the traditional angle weights discussed above. Using the 
curvature-aware approach can oftentimes reduce or eliminate this exponential 
slowdown. 

This paper justifies the validity of the curvature-aware approach by proving 
a generalization of the Cauchy-Crofton formula (Section . We then motivate 
the use of the curvature-aware approach over the traditional curvature-oblivious 
approach using the mathematics of concentration of measure |22l 1^ |53] (Sec- 

31] , specifically the Chern-Gauss-Bonnet 


tion 


4.7) and differential geometry 


Theorem [6| whose curvature form we use to re-weight the intersection volumes 
(Section |4.4|). 
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Part I 

Theoretical results and discussion 


2 Integral & differential geometry preliminaries 

2.1 Kinematic measure 

Up to this point in the paper we have talked about random search-subspaces 
informally. This notion of randomness is formally referred to as the kinematic 
measure |81l 1^ . The kinematic measure provides the right setting to state 
the Cauchy-Crofton Formula. The kinematic measure, as the name suggests, is 
invariant under translations and rotations. 

The kinematic measure is the formal way of discussing the following simple 
situation: we would like to take a random point p uniformly on the unit sphere 
or, say, inside a cube in M”. First we consider the sphere. After choosing p 
we then choose an isotropically random plane of dimension d -|- 1 through the 
point p and the center of the sphere. In the case of the sphere, this is simply 
an isotropic random plane through the center of the sphere. On a cube there 
are some technical issues, but the basic idea of choosing a random point and an 
isotropic random orientation using that point as the origin persists. On the cube 
we would allow any orientation not only those through a "center". The technical 
issues relate to the boundary effects of a finite cube or the lack of a concept of 
a uniform probability measure on an infinite space. In any case the spherical 
geometry is the natural computational setting because it is compact (If we insist 
on artificially compactifying K" or H” by conditioning on a compact subset then 
either the boundary effects cause the different search-subspaces to vary greatly 
in volume, slowing the algorithm, or we must restrict ourselves to such a large 
subset of K” or H" that most of the search-subspaces don’t pass through much 
of the region of interest). However, for the sake of completeness we introduce the 
kinematic measure for all three constant-curvature spaces (spherical. Euclidean, 
and hyperbolic) because it is relevant in more theoretical applications. 

In the spherical geometry case, we define the kinematic measure with respect 
to a fixed non-random subset S'^xed C S", usually a great subsphere, by the 
action of the Haar measure on the special orthogonal group SO(n) on S'gxed- 
When generalizing to Euclidean and hyperbolic geometry, we must be a bit 
more careful, because there is no uniform probability distribution on R" or H”. 
In the case where S has finite d-volume, we can circumvent these issues simply 
by choosing p to be a point in the poisson point process. To generalize to planes 
and hyperboloids, we may define the kinematic measure as a poisson-like point 
process for our search-subspaces with a translationally and rotationally invariant 
distribution on all of K" (or H") (the "points" here are the search-subspaces): 


Definition 1. (Kinematic measure) 

Let K" G {S",K",H"} be a constant-curvature space. Let S'gxed be a d- 
dimensional manifold that either has a finite d-volume, or is a plane (in ffi” 
only) or a hyperboloid (in H" only). Let H be the Haar measure on G. If .S' has 
finite d-volume we take G to be the group /„ of isometries of K”. If 5” is a plane 
or hyperboloid, we instead take G to be the quotient In/Id of the isometries on 
K” with the isometries on S'flxed- Let N be the counting process such that 

« n^A)] = X HiA) 

(ii) N{A) and N{B) are independent 

for any disjoint Haar-measurable subsets A, B C G, where we drop the ^ 

term if S^xed is a plane or hyperboloid. We define the kinematic measure with 
respect to S'gxed C K” to be the action of the elements of N on ^gxed- 

If we wish to actually sample from the kinematic measure for the infinite- 
measure spaces M" or H" in real life, we must restrict ourselves to some (almost 
surely) finite subset of the infinite kinematic measure point process. For in¬ 
stance, in this paper we would restrict ourselves to those subspaces that intersect 
some manifold Ai that we would like to sample. 

2.2 The Cauchy-Crofton formula 

In this section, we state the Cauchy-Crofton formula [i [SD [sa, which says 
that the volume of a manifold Ai is proportional to the average of the volumes 
of the intersection S' n of Ai with a random kinematic measure-distributed 
search-subspace S. Our first-order reweigthing (section]^, referred to as the 
"angle-independent" reweighting in the introduction, is based on this formula. 
In Section we will prove a generalization of this formula that will allow for 
higher-order reweightings. 

Lemma 1. (Cauchy-Crofton FormulalJ ^ \S1\ YWj 

Let Ai be a codimension-k submanifold o/K", where K" G {S", K", H"}. 
Let S be a random d-dimensional manifold in K" of finite volume (or a plane or 
hyperboloid), distributed aceording to the Kinematic measure. Then there exists 
a constant Cd,k,n,K such that 

Vo\n-k{Ai) = X Es[Yo\d-k{S n M)], (1) 

where we set Voh(S) to 1 if S is a plane or hyperboloid. In the spherieal case 
we have Cd,k,n,s = ^ ■ Cd.fe.n.R and Cd.k.nM are given in 

2.3 The Chern-Gauss-Bonnet theorem 

The Gauss-Bonnet theorem [33], states that the integral of the Gaussian curva¬ 
ture C of a 2-dimensional manifold Ai is proportional to its Euler characteristic 
xiM): 
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CdA = 2Trx(M). 


( 2 ) 


IM 


The Chern-Gauss-Bonnet theorem, a generalization of the Gauss-Bonnet 
theorem to arbitrary even-m-dimensional manifolds states that 


/ Pf(f])dVol„ = (27T)^xiM), (3) 

Jm 

where is the curvature form of the Levi-Givita connection and Pf is the pfaf- 
fian. The curvature form is an intrinsic property of the manifold, i.e., it does 
not depend on the embedding. In the special case when is a hypersurface, 
the curvature Pf(n 2 ,) may be computed as the Jacobian determinant of the 
Gauss map at x iMinn], i.e., as the determinant of the Hessian at x of the 
manifold when the orthogonal distance of the manifold to the tangent plane at 
X is expressed as a function of the tangent space. 

The Ghern-Gauss-Bonnet theorem is usually viewed as a way of relating 
the curvature of the manifold with its Euler characteristic. In Section |4] we will 
interpret the Ghern-Gauss Bonnet theorem as a way of relating the volume form 
dVolm to the curvature form fl. This will come in useful since the curvature 
form does not change very quickly in sufficiently smooth manifolds, allowing us 
to get an (in many cases order-of-magnitude) estimate for the volume of the 
manifold from its curvature form at a single point. 


3 A first-order reweighting via the Cauchy-Crofton 
formula 

To simplify the statements of the theorems, we introduce the following definition: 
Definition 2. (Unbiased weighting) 

We say that the random variable W is an unbiased weighting of a probability 
measure P if P(H) = E[IU x 1^] for every P-measurable set A. 

For instance, the weighted mean and the weighted histogram converge to the 
same values as the unweighted mean and histogram of X as the number of sam¬ 
ples goes to infinity. The rate of convergence, however, may be very different for 
the weighted samples than the unweighted samples. For example, while the sam¬ 
ple means P ^ + Ni and P 1 + 100 x W, where W, Va, ... ~ ^(0,1) 
i.i.d., both converge almost surely to 1 as n —>■ oo, X]r=i 1 + 100 x Ni converges 
much slower because the terms have much larger variance. Our primary goal in 
this paper is to find weightings that greatly reduce the variance in the samples 
and hence greatly increase the rate of convergence of the estimators. We now 
state the main theorem of this section, which uses the Gauchy-Grofton formula 
to obtain a variance-reducing first-order unbiased weighting of the intersection 
(Figure [5|: 
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Theorem 1. Let Q be the uniform probability measure, with density fiQ, defined 
on a subset D C of finite volume. Let A : K" —>■ be the constraint 

function and a G the constraint value. Let S be a random search-subspace 
of dimension d > k distributed according to the kinematic measure. Then the 
intersection points x of S with the manifold A4 = A“^(a) can be weighted in an 
unbiased way with respect to fp{a), the probability density o/P = AoQ at a, as 


«.(x)dVoi,_,(»,) = X (4) 

where V denotes the Jacobian, and \M\ := \JM) denotes the product 
of the singular values of any matrix M. 

Proof. We first observe that it suffices to prove the theorem for the special 
case when Q is the uniform distribution on D. We can then integrate fqix) x 
Voi„_i7-DjxVoid(S) ^ IvaWi ^ extend the result to arbitrary Q. 

Let g ~ Q be a point uniformly distributed on D. Denoting by B^{a) the 
fc-ball of radius e centered at a S we have 


fp{a) = lim 

e 4 x 0 


P(A(g) € B,{a)) 
Volt {B,(a)) 


= lijn P(g e A-Hgd(a))) ^ Vol„-i(g € X-\B,ia)))/Vol„_,{D) 
eto Volk{B,:{a)) 40 Volfc(B<:(a)) 

If 1 

= Wli- ITT / I . dVol„-fc-i(g), (5) 

Vol„_i(D) Jx-i(a) |VA(g)| 

where the last equality is obtained from the change of variables formula. We now 
use the layer cake lemma from measure theory to layer the manifold A4 = X~^{a) 
with |vA\g)| C^Voln-fc-i layers My := M H { |vA\g)| < g}’ apply the Cauchy- 
Crofton formula |H] separately to each of these layers (as illustrated in Figure 

§: 


1 


1 


Vol„_i(D) A-i(a) |VA(g)| 
1 


dVol„_fe_i(g) 


Vol„_ 

1 


—— / Voln-k-i{My)dy 
l(B) Jy>0 


Vol„_ 

1 




X 


y>0 

^d,k,n,K 


Voln-i{D) VoUiS) 

1 Cd,k,n,K 

Vol„_i(D) "" voi;^ 


X Es 
X Es 


'v>0 


VoU-k{S n My)dy 


'SnM l^^(5)l 


dVold_fe 


where the expectation Es is taken with respect to Kinematic measure on S, 
and Cd,k,n,K is the constant from the Cauchy-Crofton formula. The exchange of 
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the integral and the expectation holds by the Fubini-Tonelli theorem, since the 
integrand is nonegative. Hence, w{x) = voirf^s) ^ dVold-fc(cc) is an 

unbiased reweighting with respect to /p(a). □ 



Figure 5: The random great circle (red) intersects the constraint manifold (the 
blue ribbon which represents the level set {g : Ai = 3} in this example) at 
different points, generating samples (green dots). The constraint manifold has 
different (differential) thickness at different points, given by Theorem 

1 says that instead of weighting the green dots by the (differential) intersection 
length of the great circle and the constraint manifold at the green dot, we 
can instead weight it by the local differential thickness, greatly reducing the 
variation in the weights (see Sections 3.2 ^and[^. Decomposing this thickness 
into layers of manifolds, each of uniform thickness, by means of the Layer Cake 
Lemma from measure theory, allows us to apply the Cauchy-Crofton formula 
individually to each manifold in the proof of Theorem . 


3.1 The first-order reweighted algorithm 

As discussed in the introduction, we can apply the first-order reweighting of The¬ 
orem 1 to the Metropolis-within-Gibbs algorithm with d-dimensional isotropic 
random search-subspaces to get a more efficient MCMC algorithm (Algorithm 

0 : 
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Algorithm 1 Integral Geometry reweighted Metropolis-within-Gibbs MGMG 

1. Input: Oracle for Probability density / : K" — )■ [0,1], Oracle for Gon- 

straint function A : K" —)■ n > fc > 0, (we condition on A4 = {A(a:) = 

c}) 

2. Input: An oracle for the Jacobian VA 

3. Input: Oracle for observed statistic ijj : K" —>■ K® 

4. Input: Search-subspace dimension d > k, Starting point xq, probabil¬ 
ity density of p(r) of for the step distance r, number of Gibbs sampling 
iterations imax 

5. For i = 1 to Zmax 


(a) Generate a random isotropic d-dimensional linear search-subspace 
centered at Xi (this can be easily done using spherical Gaussians 
and the QR |3S| decomposition) 


(b) 


Use an MGMG method (usually heavily based on a nonlinear solver, 
as in m) to sample a point from the (unnormalized) probability 
density 


w{x) 


fix) X p{\\x-Xi\\) 

|VA|5.(x)| 


dVold-fc 


(7) 


supported on Si+i n At, where VAj^^ is the gradient of the restriction 
of A to the sphere of radius jjx — x^jj centered at x^. (If A4 is full¬ 
dimensional then i„,A , ,1 is set to 1) 

(Note: This is the "Metropolis" step in the traditional Metropolis- 
within-Gibbs algorithm, but reweighted according to Theorem re¬ 
stricted to the sphere Sx) 

(c) compute ifixi) 


6. Output: Unweighted samples asymptotically distributed as 

i —>■ oo according to the conditional density /|{A(x) = c}, and 

ipixi), ^/’(x 2 ),..., ■*/'(2^imax) (from which we can compute statistics of ip, such 
as the mean, variance, or the histogram of pj) 


In many cases, we can take the probability density / to be spherical Gaussian 
(for instance, A and ip can be functions of a random matrix whose entries are 
functions of iid Gaussians x = (xi,..., x„)). In this situation, we only need 
to perform one search-subspace iteration to obtain a sample from the correct 
distribution (Algorithm : 
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Algorithm 2 Integral Geometry reweighted independent search-subspace 
Metropolis-within-Gibbs MGMG for sampling from functions of Gaussians 
Goal: We wish to condition on = {A(a;) = c}, where the probability distri¬ 
bution on M" is f{x) = ^ the density of iid standard normals. 

1. Input: Oracle for Gonstraint function A : M" n> k > 0 

2. Input: Oracle for the Jacobian VA 

3. Input: Oracle for observed statistic ip : K" —>■ M® 

4. Input: Search-subspace dimension d > k. Number of iterations imax- 

5. for z = 1 to Z„iax 

(a) Generate a random isotropic d-dimensional linear search-subspace Si 
centered at the origin. 

(b) Use an MGMG method (usually heavily based on a nonlinear solver, 

as in [E]) to sample a point Xi from the (unnormalized) probability 
density w{x) = supported on Px„ is the den¬ 

sity of the Xn distribution and VAj^^ is the gradient of the restriction 
of A to the sphere Sx of radius r = ||a;|| centered at the origin. (If A4 
is full-dimensional then is set to 1.) 

6. Output: Unweighted samples {xiYpppi independent and cor¬ 

rectly distributed even for finite i according to the conditional density 
/|{A(a;) = c}, from which we can obtain ip{xi),ip{x2), ■■■,ipixi^^Y (and 
compute statistics of ip, such as the mean, variance, or histogram of ip) 


3.2 Traditional weights vs. integral geometry weights 

In this section we find the theoretical distribution of the traditional weights 
and compare them to the integral geometry weights of Theorem We will see 
that while the traditional weights have an infinite variance, greatly slowing the 
MGMG algorithm, the integral geometry weights vary only with the differential 
thickness of the level set M = {x : A(a:) = a}. 

In the codimension-fc = 1 case, we can find the distribution of the weights 
by observing that the symmetry of the Haar measure means that the distri¬ 
bution of the weights are a local property that does not depend on the choice 
of manifold Ai. Moreover, since the Kinematic measure is locally the same 
for all three constant curvature spaces S", K", and H", the distribution is the 
same regardless of the choice of constant curvature space. Hence, without loss 
of generality, we may choose M. to be the unit circle in K^. Because of the 
rotational symmetry of both the kinematic measure and the circle, without loss 
of generality we may condition on only the vertical lines {{x,t) : t G K}, in 
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which case x is distributed uniformly on [—1,1]. The weights are then given 
hy w = w{x) = 1 + with exactly two intersections at almost every 

X. Hence, E[?ii] = 2 ^^1 + jz^dx = 27r, the circumference of the circle, 

as expected. However, E[ry^] = 2 1 + yz^dx = oo. Hence, the weights w 

have infinite variance, greatly slowing the convergence of the sampling algorithm 
even in the codimension-/c = 1 case! On the other hand, the integral geometry 
weights, being identically = 1 have variance zero, so the weights do not slow 
down the convergence at all. (A related computation, which we do not give 
here, shows that the theoretical weights for general k are given by the Wishart 
matrix determinant I det(G^G) I’ ’'^here G is a (A: + 1) x A: matrix of iid standard 
normals, which also has infinite variance.) 

In practice, nonlinear solvers do not find the different intersection points 
uniformly at random, so different points can have a different distribution of 
weights, introducing an inaccuracy in the estimator that uses our samples. As 
we saw in Figure]^ the inaccuracy (as well as the variance) is much greater when 
using the traditional weights than when using the integral geometry weights. 
This inaccuracy should ideally be corrected by randomizing the solver by turning 
it into a Markov chain. The greater the randomization needed, the more the 
solver behaves like a random walk and less like a solver, slowing the convergence 
|26| . Since the samples paired with the traditional weights have much greater 
inaccuracies that need to be corrected, a Markov chain using the traditional 
weights will require greater randomization of the nonlinear solver (in addition 
to having a much greater variance in the weights), and hence should converge 
much more slowly than a Markov chain using the traditional weights. 


4 A second-order reweighting via the Chern-Gauss- 
Bonnet theorem 

Oftentimes, it is necessary to use a random great sphere of dimension d larger 
than the codimension k of the constraint manifold. For instance, the manifold 
might represent a rare event, so we might use a higher dimension than the codi¬ 
mension to increase the probability of finding an intersection with the manifold. 
However, the intersections will no longer be points but submanifolds of dimen¬ 
sion d — k. How should one assign weights to the points on this submanifold? 
The first-order factor in this weight is simple: it is the same as the Jacobian 
weight of Equation However, the size of the intersection still depends on 
the orientation of the great sphere with respect to the constraint manifold. For 
instance, we will see in Section |4.7| that if we intersect a sphere with a plane 
near its center, then we will get a much larger intersection than if we intersect 
the sphere with a plane far from its center. 

This example suggests that we should weight the points on the intersection 
using the local curvature form, which is described by the second derivatives 
of the function whose level set is the constraint manifold: If we intersect in a 
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Figure 6: Both d-dimensional slices, Si and S 2 , pass through the green point x, 
but the slice passing through the center of the n-1 sphere Ai has a much bigger 
intersection volume than the slice passing far from the center. The smaller slice 
also has larger curvature at any given point x. If we reweight the density of 
SiD A4 at a; by the Chern-Gauss-Bonnet curvature of S'i n at x, then both 
slices will have exactly the same total reweighted volume (exact in this case 
since the sphere has constant curvature form), since the Chern-Gauss-Bonnet 
theorem relates this curvature to the volume measure. 


direction where the second derivative is greater (with the plane not passing near 
the center in the example) then we should use a larger weight than in directions 
where the second derivative is smaller (when the plane passes near the center) 
(Figure [e]). 

Consider the simple case where is a collection of spheres. If we were just 
applying an algorithm based on Theoremj^ such as Algorithmj^ we would sam¬ 
ple uniformly from the volume on the intersection S' H (Step 6 in Algorithm 
[^. However, the intersected volume depends heavily on the orientation of the 
search-subspace S with respect to each intersected sphere (Figure]^, meaning 
that the algorithm will in practice have to search through exponentially many 
spheres before converging to the uniform distribution on S H AI (See section 
4.71. To avoid this problem, we would like to sample from a density w that 


is proportional to the absolute value of the Chern-Gauss-Bonnet curvature of 
SnA4 at each point x in the intersection: w = w{x; S) = |Pf(H 2 ;(Sn AI))| (The 
motivation for using the Chern-Gauss-Bonnet curvature Pf(na;(S'C AI)) will be 


discussed in Section 4.41. 


However, sampling from the density zi(a;; S') does not in general produce 
unbiased samples uniformly distributed on M even when S is chosen at random 
according to the kinematic measure. We will see in Theoremthat in order to 
guarantee an unbiased uniform sampling of Ai we can instead sample from the 
normalized curvature density 


w(x', S) = 


w{x; S) 


yous) ^ ^ 

Cd,k,n,K Eg [w(a;; Sq) x det(Proj;^4i(5)] ' 


( 8 ) 


The normalization term Eg [w{x; Sq) x det(Proj^i(5)] is the average curvature 
at X over all the random orientations at which S could have passed through 
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X. Here Sq = Q{S — x) + a; is a random isotropically distributed rotation 
of S about X, with Q the corresponding isotropic random orthogonal matrix. 
The determinant inside the expectation is there because while S is originally 
isotropically distributed, the conditioning of S to intersect M (at x) modifies 
the probability density of its orientation by a factor of det(Proj^iQ). Pm±Q 
is the projection of the orthogonal complement of the tangent space of Ai at 
X. In this collection of spheres example, the denominator is a constant for each 
sphere of a radius R. For instance, in the Euclidean case it can be computed 
analytically, using the Gauss-Bonnet theorem, as 


(27r; 


d-i r(i 

t-2- 


+ 1 ) 


7r2 (n — d) 


r(-^ + i)r(^ + i) 
{n-d)r{-^ + ^ + i) 


From this fact, together with the fact that the total curvature is always the 
same for any intersection by the Chern-Gauss-Bonnet theorem, we see that 
when sampling under the probability density w the probability that we will 
sample from any given sphere is always the same regardless of the volume of the 
intersection of S with that sphere. Since each sphere (of the same radius) has an 
equal probability of being sampled, when sampling from AA the algorithm has 
to search for far fewer spheres before converging to a uniformly random point 
on S' n than when sampling from the uniform distribution on S H . 

The need to guarantee that w will still allow us to sample uniformly without 
bias from AA motivates introducing the following theorem (Theorem]^, which, 
as far as we know, is new to the literature. Since the proof does not rely on the 
fact that w is derived from a curvature form, we state the theorem in a more 
general form that allows for arbitrary w (see Sections 4.5 and 4.6 for higher-order 
choices for w beyond just the Ghern-Gauss-Bonnet curvature). 


Theorem 2. (Generalized Cauchy-Crofton formula) 

Let AA be a codimension-k submanifold o/K" with eurvature uniformly bounded 
above, whereW^ G {S”, K”, H"}. Let S be a finite-volume (inS'^, M", or 

planar (in or hyperboloidal (in W^) random d-dimensional seareh-subspace 
with uniformly bounded curvature distributed according to the kinematic mea¬ 
sure. Then the intersection of S with AA can be weighted in an unbiased manner 
with respect to the volume-measure on AA as 


w{x; S) dVold_fe 


Vold(S) ^ w{x-,S) 

Cd,k,n,K Eq[w(x;Sq) X det(Proj;KiQ)] 


(9) 


where the pre-normalized weight w{x] S) is any function such that a < w{x; S) < 
b for some 0 < a < b, and is Lipschitz in the variable x G AA for some Lipschitz 
constant 0 < c ^ oo (when using a translation of S to keep x in S f] AA when 
we vary x). 

Q is a matrix formed by the first d columns of a random matrix sampled from 
the Haar measure on SO(n). Sq = Q{S — x) -\-x, where R± is a rotation matrix 
rotating S — x so that it is orthogonal to the tangent space of AA. Proj^^ is 
the projection onto the orthogonal complement of the tangent space of AA at x. 

(As in Lemma^ if S is a plane or hyperboloid, we set Vold(S) to 1.) 
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Corollary 2.1. 

Suppose that w{x] S) is c(t)-Lipschitz on n {a; : w[x] S) < t}, and that 

lim ^ V t) X det(Proj^iQ)] _ ^ 

t^oo EQ[i A w(x;S'q) V t X det(Proj^i(3)] 

and 


lim Es 

h—^oo 


lyi(a;) X [■u;(x; S') — - V w{x] S) A t] dVol 


U SnM 

where we define the "A" and "V" operators to be r A s := 
r M s := max{r, s}, respectively, for all r, s S K. 

Then Theorem [2] holds even for a = 0 and b = c = oo. 


= 0 , 

min{r, s} and 


Remark 2. While the Chern-Gauss-Bonnet curvature pre-weight technically 
does not satisfy the Lipschitz and boundedness conditions of Theorem we 
can introduce upper and lower cutoffs a and b to the curvature pre-weight w 
used in the algorithm to make it satisfy these conditions, using the pre-weight 
aVwAb instead. As we shall see in section [477| even in the case of positive-definite 
curvature, where arbitrarily large intersection curvatures can occur, the volume 
of the points with curvature larger than a certain cutoff accounts for only a tiny 
fraction of the average volume of a random intersection. Hence, introducing an 
upper cutoff b for the curvature reweighting should only have a tiny effect on 
the convergence rate, provided that b is large enough (if the curvature form of 
the manifold is uniformly bounded above, cutting off the curvature pre-weight 
below b will guarantee that it is Lipschitz as well). Likewise, if the volume of the 
points with curvature form below a certain cutoff is very small, then the lower 
cutoff a should also have a tiny effect on the convergence rate. For this same 
reason we expect that for most manifolds of interest the Chern-Gauss-Bonnet 
curvature pre-weight will satisfy the assumptions of Corollary 2.1, allowing us to 
use the curvature form without any cutoffs. Nevertheless, for the sake of com¬ 
pleteness, in the future we hope to further weaken the assumptions in Theorem 
[^beyond what was proved in Corollary 2.1. 

Proof. (Of Theorem]^ 

We first observe that it suffices to prove Theorem for the case where K" = 
K" is Euclidean, S' is a random plane, and w{x', S) = w{x] g^) depends only 
on the orientation = g^|^ of the tangent spaces of S and M. at x. This 
is because constant curvature kinematic measure spaces are locally Euclidean 
(and converge uniformly to a Euclidean geometry if we restrict ourselves to 
increasingly small neighborhoods of any point in the space because the curvature 
is the same). We may use any geodesic d-cube in place of the plane as a search- 
subspace S, since S can be decomposed as a collection of cubes, and Equation 
[^treats each subset of S in an identical way (since so far we have assumed that 
w{x\ S) depends only on the orientation of the tangent spaces of S and M. at 
x). We can then approximate any search-subspace S of bounded curvature, and 
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Lipschitz function w{x; S) that depends on the location on S where S intersects 
A4 (in addition to by approximating S with very small squares, each with 
a different "w{x; g^)" that depends only on 

The remainder of the proof consists of two parts. In Part I we prove the 
theorem for the special case of very small codimension-fc balls (in place of Ai). 
In Part II we extend this result to the entire manifold by tiling the manifold 
with randomly placed balls. 

Part I: Special case for small codimension-A: balls 

Let Se = B^{x) be any k-ball of radius e that is tangent to At C M" at 
the ball’s center x. Let S and S be independent random d-planes distributed 
according to the kinematic measure in M". Let r be the distance in the fc-plane 
containing (the shortest line contained in this plane) from S to the ball’s 
center x. Let 6 be the orthogonal matrix denoting the orientation of S. Then 
we may write S = Sr,e 

Then almost surely (i.e., with probability 1 ; abbreviated "a.s.") Vol(S'r,e H 
Bf) does not depend on 9 (this is because B^ is a codimension-fc ball and S is 
a d-plane, so the volume of S' n B^, itself ad— fc-ball, depends a.s. only and r 
and not on 9). We also note that w{x] obviously does not depend on r as 

well. Define the events E := {Sr, 6 i n Be 7 ^ 0} and E •= {S B^ ^ 0}. Then 


E, 


•r,0 


dS 


x,9 


dBe 


= E, 


r.e 


X Vo\d-k{Srfi n Be) 

dS., 


= Eg 


= Eg 


w x; 


dB, 

dBe 


W x; —^ X Vold_fc(Sr,e n Be) 


E 


X P(B) 


E 


X E4Vold_fc(S,,e n Be)|B] X P(B) 


( 10 ) 

( 11 ) 

( 12 ) 


1 w{x-, %^) 

X 


[Cd.fe.n.R Eg[u)(x; g^)|B] 


E 


X E4Vold_fc(S,,e n Be)|B] X P(B) 

(13) 


1 EAu>(x;^)|B] 

xE4Vold_fc(S,.enBe)|B] xP(B) (14) 


Q.fc.n.R Eg[i&(x; ^)\E] 


1 


Cd,fc,n,E 

1 


X 1 X E,[Vold_fc(S,,e n Be)|B] X P(B) 
X E,,e[Vold_fc(S,,e n B,)\E] x P(B) 


(15) 

(16) 
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X Er,6l[Vold-fc(5'r,6» n Re)] 

(17) 

Cd,fe,n,E 


X Cd,k,n,R X Vold_fe(Re) 

(18) 

^d,fc,n,E 

Vold_fe(Re). 

(19) 


• Equation is due to the fact that r and 9 are independent random 
variables even when conditioning on the event E. This is true because 
they are independent in the unconditioned kinematic measure on S', and 
remain independent once we condition on S intersecting (i.e., the event 
E) because of the symmetry of the codimension-fc ball B^. 

• Equationis due to the fact that, by the change of variables formula, 

[ VoI{Tq + Rg^yn B,)dVoln-d{y) X „ = Vol(S,) (20) 

jTg^n-d det r roj Lj 

for every orthogonal matrix Q, where the coordinates of the integral are 
conveniently chosen with the origin at the center of B^. Rq± is rotation 
matrix rotating the vector y so that it is orthogonal to Tq, the subspace 
spanned by the rows of Q. 

Multiplying by w{x] Q) and rearranging terms gives 


w{x]Q) X det(Proj^i(5) = 

Vo1(Tq + RQ±y n B^)dVoln-d(y) 


w(x; Q) X 


Vol(S,) 


( 21 ) 


Taking the expectation with respect to Q (where Q is the first d columns 
of a Haar(SO(n)) random matrix) on both sides of the equation gives 


E(3[w(x;Q) X det(ProjByQ)] 

Vol(rQ + RQ±yn Be)dVol„d(y) 


= Er 


w{x; Q) X 


Vol(B,) 


( 22 ) 


Recognizing the right hand side as an expectation with respect to the 
kinematic measure on Tq + Rq± y conditioned to intersect Re (since the 
fraction on the RHS is exactly the density of the probability of intersection 
for a given orientation of Q), we have: 


EQ[w(a;;(5) x det(ProjBiQ)] =E^ 




w x: —— 

E 

\ dM) 



(23) 
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• Equation 


15 


is due to the fact that 


dSx 


dS, 


tangent space, and hence 

d5,,e 


dS, 


dB, 


because has a constant 


Efl 


dB, 


E 

= Er fl 


= E, 


'r,6 


dS”. 


x,9 


dB, 


dSr 


dB, 


E 


E 

= E 


dS” 

dK 


E 


■ (24) 


• Equation is by the Cauchy-Crofton formula. 

Writing Eg in place of E^^e in Equation [Io|(LHS)/[I^(RHS) (we may do this 
since S = Sr ,9 is determined by r and 6), and observing that 


dBe 


we have shown that 


Es 


W X] 


d^ \ 

- 


dM y 


X Vold_fc(,SnB,) 


= Vol,,_fe(i?,). 


(25) 


Part II: Extension to all of A4 

All that remains to be done is to extend this result over all of At. To do so, 
we consider the Poisson point process {a;|} on A4, with density equal to voEirj- 
We wish to approximate the volume-measure on A4 using the collection balls 
{B^{xl)} (think of making a papier-mache mold of A4 using the balls B^{x\) as 
tiny bits of paper). 

Let A C At be any measurable subset of At. Since At and S have uniformly 
bounded curvature forms, because of the symmetry of the balls and the sym¬ 
metry of the poisson distribution, the total volume of the balls intersected by S 
and A converges a.s. to Vo^S* n At n A) on any compact sumbanifold At C At: 


^ Wo\{SnBM))x 

{i-xfeM} 


Yol{B,{xl) n A) 
Voimxt)) 



Voi(5 n Ad n A) 


(26) 


and similarly, 

^ol{B,{xf) n A) ^ Vol(Ad n A). (27) 


But, by assumption, w is Lipschitz in a; on Ad (since w, which appears in 
both the numerator and denominator of w, is Lipschitz, and the denominator 
is bounded below by a > 0), so we can cut up Ad into a countable union of 
disjoint compact submanifolds U^^^Adj such that \w{t] g^-) — w{x; gy^)| < S 
on all of X, t S Adj, and hence, by Equation [26) 


lim 

40 


^ Voi(5'nR,(x,")nA) X 

{i-.x‘eMj} 


Vol(B,(x|)nA) 

Vol(B,(x,^)) 


X w 



dS 

dA^ 


/ 

J SnMjHA 




dVol(x) 


<6x Vol{S n Mj n A) 


(28) 
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a.s. for every j. 

Summing over all j in equation 1^ implies that 


lim 

40 




I SnMnA 


dS 


. dS” \ 


dM 


dVol(x) 


< (5 X Vol(S' n 7W n A) (29) 


almost surely. Since Equation is true for every <5 > 0, we must have that 

Wo\{B,{xl) n A) 




J SnMnA 


dS" \ 
''dMJ 

dS” \ 
AM j 


w\ X 


dVol(a;). (30) 


Hence, taking the expectation E 5 on both sides of Equation [30| we get 


Es 


^v„l(snB.(.‘)nA)xM|||n^ 


X re a; 


dS 

AM 


)dVol(a:) 


(31) 


.J SnMnA 

a.s. as e), 0 (we may exchange the limit and the expectation by the dominated 
convergence theorem, since | J2i Vol(S' n B^{xl) n H) x rc(a:f; 3 ^)] is dominated 
by 2 X Vol(S' n Al) x ^) for sufficiently small e. 

Since the sum on the LHS of Equation is of nonnegative terms we may 
exchange the sum and expectation, by the monotone convergence theorem: 


Ec 


E Tri/rx 7-> / €\ x\ Yoi(B^(Xj ) n x4) / UO 

_ v°i(5n g.lx.) n A) X X ^ 

= ^Es Vol(SnB,(i')) X xu'(i*; 


dS' 


Vol(H,(a:n) 


dS” \ 
' dM / 


(32) 


But by Equation|2^ Es[Vol(S' n B^{xl)) x w{xl; ^)] = Vol(i?e(a;|)), so 


Es 


J2voi{snB,{xt)) X 


dS \ 
AM / 


= ^Vol(H,(a:,^)) 




almost surely as e ), 0 by Equation ^7\ 
Combining Equations and gives 


Es 


I SnMnA 


w\ x: 


dS 

dM 


dVol(a:) 


= Vol(Mnx4). 


(34) 

□ 
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Proof. (Of Corollary 2 . 1 ) 

Define 

Eg [(u;(a;; Sq) - a A w{x; Sq) W b) x det(Proj_v|i( 5 )] 

'ih(t) '.= - - - 

Eg [a A w{x; Sq) V & x det(Proj_v|i( 5 )] 

Let A be any Lebesgue-measurable subset. Then 


Es 


1^(3;) X w(a:; 5 ')dVol 


'snM 


= lim E5 

UsnM 


X w(a:; 5 ')dVol 


= lim E5 

t—>-oo 


■ lim E5 


= lim E5 

UsnM 


= lim E5 

t—^OO 


= lim E5 


f X - V r(;(a;; S') A t dVol 

J SnM ^ 

f lA(a;) X [w(a;; S)-V w{x] S) A t] dVol 

JsnM ^ 

[ 1^(3;) X - V r(;(a;; S) A t dVol 

J SnM ^ 

1^(3;) X 


+ 0 

i V w{x\ S) At 


UsnM 


UsnM 


1a(3;) 


Eg[w(x;Sg) X det(Proj_v|iQ)] 


i V w{x; S) At 


dVol 


Eg [i V w{x] Sq) Atx det(Proj;^<i( 3 )] x (1 + V'(t)) 


dVol 


= lim E5 

t—¥(X) 


UsnM 


1a(3;) 


J A w{x; S)\/ t 


Eg [i A w{x] Sq) V t X det(Proj;^<i( 3 )] 
1 


dVol 


1 + tp{t) 


= lim Vol(A^ n A) X ,, , 
i^oo ^ ' i + i’it) 


= Vol{M n A) X 1 
= Vol(A^ n A), 


( 35 ) 

( 36 ) 

( 37 ) 

( 38 ) 

( 39 ) 


( 40 ) 


( 41 ) 

( 42 ) 

( 43 ) 

( 44 ) 
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Equation [38| is true because 


0 < Es 


< Es 


j X - A w(x] S)\/1 dVol 

JsnM ^ 

f - A w(a;; S') V t dVol 
JsnM ^ 


• Equation [i^follows from Theorem [fusing jAw(x; S)Vt as our pre-weight. 
Indeed, j A w{x\ S) V t obviously satisfies the boundedness conditions of 
Theorem Moreover, since w{x] S) is c(t)-Lipschitz everywhere on M. 
where 'w{x; S) < t, the pre-weight | A w{x] S) V t must be c(f)-Lipschitz 
on all X S AI. 


□ 


4.1 Second-order Chern-Gauss-Bonnet theorem reweighted 
algorithm 

Using the second order Chern-Gauss-Bonnet theorem reweighting of Theorem 
together with the first-order reweighting of Theorem (which we already 
implemented in Algorithm gives the following improvement to Algorithm 


Algorithm 3 Curvature-reweighted Metropolis-within-Gibbs MCMC 
All steps except steps 2 and 5(b) are the same as in Algorithm]^ 

2. Input: An oracle for the Jacobian VA and Levi-Civita connection curvature 
form U of the level set A4 = {x : A(x) = c} (possibly given as the set of 
second partial derivatives) 

5. (b) Use an MCMC method (usually heavily based on a nonlinear solver, as in 
1151 1 to sample a point x^+i from the (unnormalized) probability density 


w{x) 


f{x) X p(||x-Xj||) 

|VA|5.(x)| 

_|Pf(u.(^.+inMn5,,))|_ 

EQ[|Pf(U,(5Q n At n 5,)) I X det(Proj^^^g)] 


(45) 


supported on 5'i+i H At, where Sx is the sphere of radius ||x — Xi|| cen¬ 
tered at Xi- (If M. is full-dimensional then is set to 1) (Note: 

This is the "Metropolis" step in the traditional Metropolis-within-Gibbs 
algorithm, but re weighted according to Theorems and restricted to 
the sphere Sx) 


Remark 3. The curvature form r22,(5'i+i H AI H iSa,) of the intersected manifold 
can be computed in terms of the curvature form VLx{M.) of the original manifold 
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by applying the implicit function theorem twice in a row. Also, if is a 
hypersurface then |Pf(na;(S'i+i n n 5a;)) | is the determinant of the product of 
a random Haar-measure orthogonal matrix with known deterministic matrices, 
and hence EQ[|Pf(fla;((5n n 5a;))| x det(Proj^iQ)] is also the expectation of 
a determinant of a random matrix of this type. If the Hessian is positive-definite, 
then we can obtain an analytical solution in terms of zonal polynomials. Even in 
the case when the curvature form is not a positive-definite matrix (it is a matrix 
with entries in the algebra of differential forms), the fact that the curvature 
form is the Pfafhan of a random curvature form (in particular, a determinant 
of a real-valued random matrix in the codimension-1 case) should make it very 
easy to compute numerically, perhaps by a Monte Carlo method. 

This fact also means that it should be easy to bound the expectation, which 
allows us to use Theorem to get bounds for the volumes of algebraic manifolds 
(Section 4.8 1 . 


Remark 4. While the Chern-Gauss-Bonnet theorem only holds for even-dimensional 
manifolds, we can always modify the dimension of the search subspace by 1 so 
that the dimension d — k of S H Ai is even. Since we are sampling from a rare 
event, we must in any case choose d >> 1, so it makes little difference compu¬ 
tationally if S has dimension d or d-l- 1. Alternatively, we can include a dummy 
variable to increase both the dimensions n and d by 1. 


4.2 Reweighting when sampling from full-dimensional dis¬ 
tribution (as opposed to lower-dimensional manifolds) 

In many cases one might wish to sample from a full-dimensional set of nonzero 
probability measure. One could still reweight in this situation to achieve faster 
convergence by decomposing the probability density into its level sets, and ap¬ 
plying the weights of Theorems and separately to each of the (infinitely 
many) level sets. We expect this reweighting to speed convergence in cases 
where the probability density is concentrated in certain regions, since when d 
is large, intersecting these regions with a random search-subspace S typically 
causes large variations in the integral of the probability density over the different 
regions intersected by S, unless we reweight using Theorems and 


4.3 An MCMC volume estimator based on the Chern- 
Gauss-Bonnet theorem 

In this section we briefly introduce a (as far as we know) new MCMC method of 
estimating the volume of a manifold that is based on the Chern-Gauss-Bonnet 
curvature. While this method is interesting in its own right, we choose to intro¬ 
duce it at this point since it will serve as a good introduction to our motivation 


(Section 4.41 for using the Chern-Gauss-Bonnet curvature as a pre-weight for 
Theorem |2j 

Suppose we somehow knew or had an estimate for the Euler characteristic 
x{Ai) 7 ^ 0 of a closed manifold Ai of even-dimension m. We could then use a 
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Markov chain Monte Carlo algorithm to estimate the average Gauss curvature 
form E 7 VI [(Pf(fi))] on M. 

The Chern-Gauss-Bonnet theorem says that 

[ Pf(f])dVoU = (2^)?x(-M)- (46) 

Jm 


We may rewerite this as 


By definition, the left 


/^Pf(f2)dVoU {2n)^xiM) 

Im dVoU Im dVoU ■ 

hand side is E 7 vt[(Pf(f^))], and dVol„ 
(27r)T;^(7W) 


E^[(Pf(r!))] = 


VoU(Af) 


(47) 

Yolm{M), so 

(48) 


from which we may derive an equation for the volume in terms of the known 
quantities E^[(Pf(f2))] and x(A^) 


Yol^iM) 


(27r)?x(Af) 

E^[(Pf(f2))]- 


(49) 


4.4 Motivation for reweighting with respect to Chern- 
Gauss-Bonnet curvature 


While Theorem tells us that any pre-weight w generates an unbiased weight 
w, it does not tell us what pre-weights reduce the variance of the intersection 
volumes. We argue here that the Chern-Gauss-Bonnet theorem in many cases 
provides us with an ideal pre-weight if one only has access to the local second- 
order information at a point x. 

Equation of Section |4.3| gives an estimate for the volume 


Void_fe(5nM) 


{2TT)^x{SnM) 

EsnAr[(Pf(f^(^nM)))]’ 


(50) 


where n(S' G M) is the curvature form of the submanifold S' G M. 

If we had access to all the quantities in Equation our pre-weight would 
then be - ( 0 ^,... = ^snA 4 [Tf(n(SnA^)))] ^ jjowever, as we shall see we can- 

void.fctsnTvq { 27 r)^x{SnM) 

not actually implement this pre-weight since some of these quantities represent 
higher-order information. To make use of this weight to the best of our abil¬ 
ity given only the second-order information, we must separate the higher-order 
components of the weight from the second-order components by dividing out 
the higher-order components. 

The Euler characteristic is essentially a higher-order property, so it is not 
reasonable in general to try to estimate the Euler characteristic xiSOAi) using 
the second derivatives of A4 at a; because the local second order information gives 


26 









US little if any information about x(S'nAf) (although it may in theory be possible 
to say a bit more about the Euler characteristic if one has some prior knowledge 
of the manifold). The best we can do at this point is to assume the Euler 
characteristic is a constant with respect to S, or more generally, statistically 
independent of S. 

All that remains to be done is to estimate E 5 n>|Pf(^^('S'H M)). We observe 
that 

E,„«Pf(n(SnM)) = E,„«|Pt(n(SnA<))| x 

But the ratio pf[n (sn^M)) | ^ higher-order property since all it does 

is describe how much the second-order Chern-Gauss-Bonnet curvature form 
changes globally over the manifold, so in general we can say nothing about 
it using only the local second-order information. The best we can do at this 
point is to assume that this ratio is statistically independent of S as well. 

Hence, we have: 


1 

VoW^TEM) 


EsnMlPmiS n M j)\x 


{i27r)^xiSnM) 


EsnMPmiSnM)) 

EsnAr|Pf(H(5nX))| 


), (52) 


where we lose nothing by dividing out the unknown quantity 

(27r)™x(Afsince we have no information about it and it is 

independent of S. 

We would therefore like to use EsnEi |Pf(^2(S' G Af))| as a pre-weight. Since 
we only know the curvature form f2(S' H A4) locally at x, our best estimate 
for E5n>i|Pf(f^(«5' H Af))| is the absolute value |Pf(f22,(S' G M)\ of the Chern- 
Gauss-Bonnet curvature at x. Hence, our best local second-order choice for the 
pre-weight is w = |Pf(Ha;(S' G A^)|. 


4.5 Higher-order Chern-Gauss-Bonnet reweightings 

One may consider higher-order reweightings which attempt to guess not only the 
second-order local intersection volume, but also make a better guess for both the 
Euler characteristic of the intersection S'g G At, and how the curvature would 
vary over Sq G At. Nevertheless, higher-order approximations are probably 
harder to implement for the same reason that most nonlinear solvers, such as 
Newton’s method, do not use higher-order derivatives. 


4.6 Possible reweightings using Atiyah-Singer index the¬ 
orem or other topological invariants 

One may also consider reweighting with respect to topological invariants of Rie- 
mannian manifolds other than the Chern-Gauss-Bonnet curvature. For instance. 
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it may be possible to reweight with respect to the integrand of the Atiyah-Singer 
index theorem |5], which is the product of the Chern-Gauss-Bonnet curvature 
form and another differential form associated with an elliptical partial differ¬ 
ential equation (PDE) defined on A4. The Atiyah-Singer index theorem says 
that the integral of the product of these two differential forms over Ai is equal 
to the product of x(Af) and another term that is invariant under continues 
transformations of the PDE. The idea would be to use a carefully chosen PDE, 
whose associated differential form attempts to "counterbalance" the curvature 
form: when the manifold’s curvature is big, the PDE’s differential form is small, 
and vice versa. However, it remains to be shown whether such elliptical PDEs 
are easy to obtain for an implicitly defined manifold Af. 


4.7 Collection-of-spheres example and concentration of mea¬ 
sure 

In this section we argue that the traditional algorithm suffers from an exponen¬ 
tial slowdown (exponential in the search-subspace dimension) unless we reweight 
the intersection volumes using Corollary 2.1 with the Chern-Gauss-Bonnet cur¬ 
vature weights. We do so by applying two concentration of measure results, 
which we derive in |22| . to an example involving a collection of hyperspheres. 

Consider a collection of very many hyperspheres in K". We wish to sample 
uniformly from these hyperspheres. To do so, we imagine running a Markov 
chain with isotropically random search-subspaces. We imagine that there are 
so many hyperspheres that a random search-subspace typically intersects expo¬ 
nentially many hyperspheres. As a first step we would use Theorem which 
allows us to sample the intersected hypersphere from the uniform distribution 
on their intersection volumes. While using Theorem should speed conver¬ 
gence somewhat (as discussed in Section 3.21, concentration of measure causes 


the intersections with the different hyperspheres to have very different volumes 
(Figure]^. In fact we shall see that the variance of these volumes increases expo¬ 
nentially in d, causing an exponential slowdown if only Theorem is used, since 
the Metropolis subroutine would need to find exponentially many subspheres 
before converging. 

Reweighting the intersection volumes using Theorem causes each ran¬ 
dom intersection S H j\4i (where Ali is a subsphere) to have exactly the same 
reweighted intersection volume, regardless of the location where S intersects 
A4i, and regardless of d. Hence, in this example, Theorem allows us to avoid 
the exponential slowdown in the convergence speed that would otherwise arise 
from the variance in the intersection volumes. 

The first result deals with the variance of the intersection volumes of a 
sphere in Euclidean space. It says that the variance of the intersection volume, 
normalized by it’s mean, increases exponentially with the dimension d (as long 
as d is not too close to n). Although isotropically random search-subspaces are 
(conditional on the radial direction) distributed according to the Haar measure 
in spherical space, the Euclidean case is still of interest to us since it represents 
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Figure 7: The random search-subspace S intersects a collection of spheres Ai. 
Even though the spheres in this example all have the same n — 1-volume, the 
d — 1-volume of the intersection of S with each individual sphere (green circles) 
varies greatly depending on where S intersects the sphere if d is large. In 
fact, the variance of the intersection volume of each intersected sphere increases 
exponentially with d. This "curse of dimensionality" for the intersection volume 
variance leads to an exponential slowdown if we wish to sample from 5'n A1 with 
a Markov chain sampler (and 5" n A1 consists of exponentially many intersected 
spheres). However, if we use the Chern-Gauss-Bonnet curvature to reweight 
the intersection volumes, then all spheres in this example will have exactly the 
same reweighed intersection volume, greatly increasing the convergence of the 
Markov chain sampler. 


the limiting case when the hyperspheres are small, since spherical space is locally 
Euclidean. 


Theorem 3. (Concentration of Euclidean Kinematic Measure) 

Let S C ffi" be a random d-dimensional linear affine subspace distributed ac¬ 
cording to the Kinematic measure on K". Let A1 = S" C K" be the unit sphere 
in M". Defining a := we have 


- 1 < < A'(o.d)e-«0) - 1, (53) 


where 


^{a) = log(2) + (-)log(-) -{^ + i)log(- + 1) - ((^ - blog(- - 1) 
oi (y. Zi (y Z(y z (y 

k{a,d) = (^^^)(n-d)^^ 


' (n-l)(g-l) 
(d — l)(n -I- c? — 2) 


- 1 -- 


X e 


K{a,d) = (^)(n- d) 




(d — 1)(?T. -f d — 2) 


The next result (Figure]^ deals with the spherical geometry case. As in the 
Euclidean case, the spherical concentration result says that the variance of the 
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intersection volume increases exponentially with the dimension d as well. (While 
we were able to derive the analytical expression for the probability distribution 
of the intersection volumes, which we used to generate the plot in Figure 
showing an exponential increase in variance, we have not yet finished deriving a 
formal inequality analogous to Theorem for the spherical geometry case. We 
hope to make the analogous result available soon in |22| 1 



Figure 8: This log-scale plot (from [^) shows the variance of Vo^S” n A4) nor¬ 
malized by its mean, when 5" is a Haar-measure distributed d-dimensional great 
subsphere of S”, for different values of d, where n = 400. Ai is taken to be the 
boundary of a spherical cap of S" with geodesic radius such that S has a 10% 
probability of intersecting A4. The variance increases exponentially with the 
dimension d of the search-subspace (as long as d is not too close to n), leading 
to an exponential slow-down in the convergence for the traditional Metropolis- 
within-Gibbs algorithm applied to the collection-of-spheres example of Section 
|4.7| Reweighting the intersection volumes with the Chern-Gauss-Bonnet curva¬ 
ture using Gorollary 2.1 in this situation (where A1 is a subsphere) causes each 
random intersection S' H A4 to have exactly the same intersection volume re¬ 
gardless of d, allowing us to avoid the exponential slowdown in the convergence 
speed that would otherwise arise from the variance in the intersection volumes. 


4.8 Theoretical bounds derived using Theorem and al¬ 
gebraic geometry 

Generalizing on bounds for lower-dimensional algebraic manifolds based on the 
Gauchy-Grofton formula (such as the bounds for tubular neighborhoods in pi] 
and m): it is also possible to use Theorem to get a bound for the volume 
of an algebraic manifold A1 of given degree s, as long as one can also use 
analytical arguments to bound the second-order Ghern-Gauss-Bonnet curvature 
reweighting factor on Ai for some convenient search-subspace dimension d: 
Corollary 2.2. Let Ai C M" be an algebraic manifold of degree s and codimen¬ 
sion 1, such that EQ[|Pf(r2a;(5'QnA4)| xdet(Proj^j_Q)] > ^for every x € Ai, and 
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the conditions of Corollary 2.1 are satisfied if we set w{x] S) = |Pf(na;(S'H Al)|. 
Then 


Vol(M) < —— X i X (54) 

Proof. If we have an algebraic manifold of degree s in M”, by Bezout’s theorem 
the intersection with an arbitrary plane is also degree s. Hence (at least in 
the case where A4 has codimension 1), we can use Risler’s bound to bound 
the integral of the absolute value of the Gaussian curvature over 5” n by 

a := £lfi±zillvol(S") [301 HZ]. 

By Theorem]^ 


Vol(M) X 


|Pf(n,(5nA4)| 


Cd,k,n.K EQ[|Pf(fla;(S'(5 n Al)| X det(Proj^i(5)] 


dVold-fc] 


< 


X Es[|Pf(fl.(5nM)|] X ^ — 

^d.k.n.R ^ (^d.k.r. 


X a X 

b 


□ 

Unlike a bound derived using only the Cauchy-Crofton formula for point 
intersections, the bound in Corollary 2.2 allows us to incorporate additional 
information about the curvature, so we suspect that this bound will be much 
stronger in situations where the curvature does not vary too much in most 
directions over the manifold. We hope to investigate examples of such manifolds 
in the future where we suspect Corollary 2.2 will provide stronger bounds, but 
do not pursue these examples here because it is beyond the scope of this paper. 


Part II 

Numerical simulations 

5 Random matrix application: Sampling the stochas¬ 
tic airy operator 

Oftentimes, one would like to know the distribution of the largest eigenvalues 
of a random matrix in the large-n limit, for instance when performing principal 
component analysis m- For a large class of random matrices that includes 
the Gaussian orthogonal/unitary/symplectic ensembles, and more generally the 
beta-ensemble point processes, the joint distribution of the largest eigenvalues 
converges in the large-n limit, after rescaling, to the so-called hard-edge limiting 
distribution (The single-largest eigenvalue’s limiting distribution is the well- 
known Tracy-Widom distribution) |17l 1331 131 [T3]. One way to learn about these 
distributions is to generate samples from certain large matrix models. One 
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such matrix model that converges particularly fast to the large-n limit is the 
tridiagonal matrix discretization of the Stochastic Airy operator of Edelman 
and Sutton IMIE], 


d^ 2 

-X H- dW 

dx2 ’ 


(55) 


where dW is the white noise process. We wish to study the distributions of 
eigenvalues of the hard edge conditioned on other eigenvalue(s) or eigenvector 
statistics. 

To obtain samples from these conditional distributions, we can use Algorithm 
which is straightforward to apply in this case since dW is already discretized 
as iid Gaussians. 

The stochastic Airy operator (551 can be discretized as the tridiagonal matrix 

IS1I35] 


A /3 = A -hx diag(l, 2,..., k) + 


(56) 


where A = 


-2 1 
1 -2 1 

1 -2 1 


is the k X k discretized Laplacian, 


1 -2 1 
1 -2 

N ~ diag(A/'(0,1)") is a vector of independent standard normals, and the cutoff 
k is chosen (as in [51 |3S]) to be fc = 10n“3 (the O(10n“3) cutoff is due to 
the decay of the eigenvectors corresponding to the largest eigenvalues, which 
decay like the Airy function, causing only the first O(10n“3) entries to be 
computationally significant). 


5.1 Approximate sampling algorithm implementation 

Since the discretized stochastic Airy operator Ap is already explicitly a function 
of the iid Gaussians (Equation 561, we can use Algorithmj^to sample Ap 

conditional on our eigenvalue constraints of interest. To simplify the algorithm 
we can use a deterministic nonlinear solver with random starting points in place 
of the nonlinear solver-based MGMG “Metropolis” subroutine of Algorithm to 
get an approximate sampling (Algorithm 2.1, below). This is somewhat analo¬ 
gous to setting both the hot and cold baths in a simulated annealing-based (see, 
for instance, [32]) "Metropolis step" in a Metropolis-within-Gibbs algorithm to 
zero temperature, since we are setting the randomness of the Metropolis sub¬ 
routine to zero while fully retaining the randomness of the search-subspaces. 
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Algorithm 2.1 Integral Geometry reweighted one-iteration “Deterministic 
Nonlinear Solver-within-Gibbs” for sampling from Gaussians 
Steps 1-4 are the same as in Algorithmic 

5. for i = 1 to linax 

(a) Generate a random isotropic d-dimensional linear search-subspace Si cen¬ 
tered at the origin. Generate a sphere rS" centered at the origin with 
random radius r distributed according to the Xn distribution. 

(b) Use a deterministic nonlinear solver with random starting point to find 
a point Xi on the intersection S'i+i n Al n rS”. 

(c) compute ipixi) and the weight w{xi) = |va| s^*(a; )| • 


7. Output: Weighted samples {xiY^^^ with associated weights {w{xi)}l’XY that 

are independent and approximately distributed according to the conditional 

density /|{A(a;) = c}, where f{x) = ^ ^ is the density of iid 

V (2ir)" 

standard normals, from which we can obtain ^( 0 : 2 ): (and 

compute statistics of '0, such as the weighted sample mean ^ X) ^ 
ip{xi), the weighted sample variance, or the weighted histogram of Y) 


Remark 5. Using a deterministic solver with random starting point (Algorithm 
2.1) in place of the more random nonlinear solver-based Metropolis Markov 
chain subroutine of Algorithm [C introduces some bias in the samples, since the 
nonlinear solver probably will not find each point in the intersection S'i+i H Al H 
rS" with equal probability. There is nothing preventing us from using a more 
random Markov chain in place of the deterministic solver, which one would 
normally do. However, since we only wanted to compare weighting schemes, 
we can afford to use a more deterministic solver in order to simplify numerical 
implementation for the time being, as the implementation of the “Metropolis” 
step would be beyond the scope of this paper. It is important to note that this 
bias is not a failure of the reweighting scheme, but rather just a consequence 
of using a purely deterministic solver in place of the “Metropolis” step. On 
the contrary, we will see in Sections and that this bias is in fact much 
smaller than the bias present when the traditional weighting scheme is used 
together with the same deterministic solver. In the future, we plan to also 
perform numerical simulations with a random Metropolis step in place of the 
deterministic solver, as described in Algorithmic 

5.2 Comparison to rejection sampling 

In this section we briefly explain why rejection sampling is oftentimes too slow 
when sampling from the stochastic Airy operator, implying that there is a need 
for a faster algorithm like Algorithm 2.1. Rejection sampling is slow if we 
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condition on many eigenvalues at a time, or if we want to condition on a large 
deviation of even a single eigenvalue, since in both cases we are conditioning 
on rare events that occur with very low probability. In the large deviation 
case even the event that the largest eigenvalue is bigger than some value (as 
opposed to exactly equal to some value) is nearly a lower-dimensional manifold, 
since the eigenvalue distributions have Gaussian tails, which decay very quickly. 
For example, when we have Gaussian tails, for large t the event {Ai > t} is 
approximately equal in probability to an event {Ai € [t,t + e)} where e 0 as 
t —)■ oo. Hence, {Ai S [t,t + e)} converges to the lower-dimensional manifold 
{Ai = t} as t ^ oo. 

6 Conditioning on multiple eigenvalues 

In the first simulation (Figure]^, we sampled the fourth-largest eigenvalue con¬ 
ditioned on the remaining 1st- through 7th- largest eigenvalues. We begin with 
this example since in this particular situation, when conditioned only on the 
3rd and 5th eigenvalues, the 4th eigenvalue is not too strongly dependent on 
the other eigenvalues (the intuition for this reasoning comes from the fact that 
the eigenvalues behave as a system of repelling particles with only week repul¬ 
sion, so the majority of the interaction involves the immediate neighbors of A 4 ). 
Hence, in this situation, we are able to test the accuracy of the local solver 
approximation by comparison to brute force rejection sampling. Of course, in a 
more general situation where we do not have these relatively week conditional 
dependencies, rejection sampling would be prohibitively slow (e.g., even if we 
allow a 10 % probability interval for each of the six eigenvalues, conditioning on 
all six eigenvalues gives a box that would be rejection sampled with probability 
10 - 6 ). 

Despite the fact that the integral geometry algorithm is solving for 6 dif¬ 
ferent eigenvalues simultaneously, the conditional probability density histogram 
obtained using Algorithm 2.1 with the integral geometry weights (Figure 
blue) agrees closely with the conditional probability density histogram obtained 
using rejection sampling (Figure black). Weighting the exact same data 
points obtained with Algorithm 2.1 with the traditional weights instead yields 
a probability density histogram (Figure red) that is much more skewed to 
the right than either the black or blue curves. This is probably because, while 
theoretically unbiased, the traditional weights greatly amplify a small bias in 
the nonlinear solver’s selection of intersection points. 

7 Conditioning on a single-eigenvalue rare event 

In this set of simulations (Figure [To|, we sampled the second-largest eigenvalue 
conditioned on the largest eigenvalue being equal to -2, 0, 2, and 5. Since Ai = 5 
is a very rare event, we do not have any reasonable chance of finding a point 
in the intersection of the codimension 1 constraint manifold Ai = {Ai = 5} 
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Figure 9: In this simulation we used Algorithm 2.1 together with both the 
traditional weights (red) and the integral geometry weights (blue) to plot the 
histogram of A 4 |(Ai,A 2 , A 3 , A 5 , Ae, A 7 ) = (—2, —3.5, —4.65, —7.9, —9, —10.8) . We 
also provided a histogram obtained using rejection sampling of the approximated 
conditioning A 4 |(A 3 ^A 5 ) € [—4.65 ± 0.001] x [—7.9 ± 0.001] (black) for compar¬ 
ison (conditioning on all six eigenvalues would have caused rejection sampling 
to be much too slow). Since we used a deterministic solver in place of the 
Metropolis subroutine in Algorithm 2, some bias is expected for both re weight¬ 
ing schemes. Despite this, we see that the integral geometry histogram agrees 
closely with the approximated rejection sampling histogram, but the traditional 
weights lead to an extremely skewed histogram. This is probably because, while 
theoretically unbiased, the traditional weights greatly amplify a small bias in 
the nonlinear solver’s selection of intersection points. The skewness is especially 
large (in comparison to Figure [l0| because we are conditioning on 6 eigenvalues 
simult aneously. 


with the search-subspace unless we use a search-subspace of dimension d >> 1 . 
Indeed, the analytical solution for Ai tells us that P(Ai > 2) = 1 x 10“^, 
P(Ai > 4) = 5 X 10-®and P(Ai > 5) < 8 x 10-^° mUS]. For this same reason, 
rejection sampling for Ai = 2 is very slow (58 sec./sample vs. 0.25 sec./sample 
for Algorithm 2.1) and we cannot hope to perform rejection sampling for Ai = 5 
(It would have taken about 84 days to get a single sample!). To allow us to make 
a histogram in a reasonable amount of time, we will use Algorithm 2.1 with 
search-subspaces of dimension d = 23 >> 1, vastly increasing the probability of 
the random search subspace intersecting A4. 

In (Figure [l0| top), we see that while the rejection sampling (black) and 
integral geometry weight (blue) histograms of the density of A 2 IA 1 = 2 are fairly 
close to each other, the plot obtained with the exact same data as the blue 
plot but weighted in the traditional way (red) is much more skewed to the right 
and less smooth than both the black and blue curves, implying that using the 
integral geometry weights from Theorem greatly reduces bias and increases 


the convergence speed (Although the red curve is not as skewed as in Figure 10 
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of Section This is probably because in this situation the codimension of A4 
is 1, while in Section]^ the codimension was 6.) 

In (Figuremiddle), where we conditioned instead on Ai = 5, we see that 
solving from a random starting point but not restricting oneself to a random 
search-subspace (purple plot) causes huge errors in the histogram of A 2 IA 1 . We 
also see that, as in the case of Ai = 2, the plot of A 2 obtained with the traditional 
weights is much more skewed to the right and less smooth than the plot obtained 
using the integral geometry weights. 

In (Figure [l0| bottom), we use our Algorithm 2.1 to study the behavior of 
A 2 IA 1 for values of Ai at which it would be difficult to obtain accurate curves 
with traditional weights or rejection sampling. We see that as we move Ai to 
the right, the variance of A 2 IA 1 increases and the mean shifts to the right. One 
explanation for this is that the largest and third-largest eigenvalues normally 
repel the second-largest eigenvalue, squeezing it between the largest- and third- 
largest eigenvalues, which reduces the variance of A 2 IA 1 . Hence, moving the 
largest eigenvalue to the right effectively "decompresses" the probability density 
of the second-largest eigenvalue, increasing it’s variance. Moving the largest 
eigenvalue to the right also allows the second-largest eigenvalue’s mean to move 
to the right by reducing the repulsion from the right caused by the largest 
eigenvalue. 

Remark 6. As discussed in Remark of Section |5.1| if we wanted to get a 
perfectly accurate plot, we would still need to use a randomized solver, such as 
a Metropolis-Hastings solver, to randomize over the intersection points. Since 
d = 23, the volumes of the exponentially many connected submanifolds in the 
intersection would be concentrated in just a few of these submanifolds, 

with the concentration being exponential in d, causing the algorithm to be 
prohibitively slow for d = 23 unless we use Algorithm which uses the Chern- 
Gauss-Bonnet curvature reweighting of Theorem (see Section |4.7| ) . Hence, 
if we were to implement the randomized solver of Algorithm the red curve 
would converge extremely slowly unless we reweighted according to Theorem 
(in addition to Theorem . Hence, the situation for the traditional weights is 
in fact much worse in comparison to the integral geometry weights of Theorems 
and 1^ than even (Figure [lol middle) would suggest. 
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Figure 10: Histograms of A 2 IA 1 = —2,0,2, and 5, generated using Algorithm 
2.1. A search-subspace of dimension d = 23 was used, allowing us to sample 
the rare event Ai = 5. In the first plot (top) we see that the rejection sampling 
histogram of A 2 IA 1 = 2 is much closer to the histogram obtained using the inte¬ 
gral geometry weights (blue) than the histogram obtained with the traditional 
weights (red) because the red plot is much more skewed to the right and less 
smooth (it takes longer to converge) than either the blue or black plots. If we 
do not constrain the solver to a random search subspace, the histogram we get 
for A 2 IA 1 = 5 (purple) is very skewed to the right (middle plot), implying that 
using random search-subspace (as a opposed to just a random starting point) 
greatly helps in the mixing of our samples towards the correct distribution. In 
the last plot (bottom), the probability densities of A 2 IA 1 obtained with the in¬ 
tegral geometry weights show that moving the largest eigenvalue to the right 
has the effect of increasing the variance of the probability density of A 2 IA 1 and 
moving its mean to the right, probably because the second eigenvalue feels less 
repulsion from the largest eigenvalue as Ai —>■ 00. 


37 























References 


[1] Anders S. Christensen, Troels E. Linnet, Mikael Borg, Wouter Boomsma, 
Kresten Lindorff-Larsen, Thomas Hamelryck and Jan H. dense. Protein 
structure validation and refinement using amide proton chemical shifts de¬ 
rived from quantum mechanics. PLoS ONE, 8(12):1-10, 2013. 

[2] Michael F. Atiyah and Isadore Singer. The index of elliptic operators 
on compact manifolds. Bulletin of the American Mathematical Society, 
69(3):422-433, 1963. 

[3] Julian Besag. Markov chain Monte Carlo for statistical inference. Technical 
report. University of Washington, Department of Statistics, 04 2001. 

[4] Folkmar Bornemann. On the numerical evaluation of distributions in 
random matrix theory: A review. Markov Processes and Related Fields, 
16(4):803-866, 2010. 

[5] A. Doucet C. Andrieu, N. de Freitas and M.I. Jordan. An introduction to 
MCMC for machine learning. Machine Learning, 50:5-43, 2003. 

[6] Shiing-Shen Chern. On the curvatura integra in a Riemannian manifold. 
Annals of Mathematics, 46(4):674-684, 1945. 

[7] Neil J. Cornish and Edward K. Porter. MCMC exploration of supermassive 
black hole binary inspirals. Classical Quantum Gravity, 23(19):761-767, 
2006. 

[8] Morgan W. Crofton. On the theory of local probability, applied to straight 
lines drawn at random in a plane; the methods used being also extended 
to the proof of certain new theorems in the integral calculus. Philosophical 
Transactions of the Royal Society of London, 158:181-199, 1868. 

[9] Alan Edelman and Brian D. Sutton. From random matrices to stochastic 
operators. Journal of Statistical Physics, 127(6):1121-1165, 2007. 

[10] Israel M. Gelfand and Mikhail M. Smirnov. Lagrangians satisfying Crofton 
formulas. Radon transforms, and nonlocal differentials. Advances in Math¬ 
ematics, 109(2):188-227, 1994. 

[11] Charles J. Geyer. Markov chain Monte Carlo maximum likelihood. In 
Computing Science and Statistics: Proceedings of the 23rd Symposium on 
the Lnterface, pages 156-163, 1991. 

[12] Larry Guth. Degree reduction and graininess for Kakeya-type sets in 
preprint on arXiv: 1402.0518, 2014. 

[13] Sigurdur Helgason. Integral Geometry and Radon Transforms. Springer, 
New York, 2010. 


38 



[14] Jody Hey and Rasmus Neilsen. Integration within the Felsenstein equation 
for improved Markov chain Monte Carlo methods in population genetics. 
Proceedings of the national academy of sciences of the United States of 
America, 104(8):2785-2790, 2006. 

[15] J. Martin, L. Wilcox, C. Burstedde and O. Ghattas. A stochastic Newton 
MCMC method for large-scale statistical inverse problems with application 
to seismic inversion. SIAM Journal on Scientific Computing, 34(4):A1460- 
A1487, 2012. 

[16] E. Fernandes J.C. Alvarez Paiva. Gelfand transforms and Crofton formulas. 
Selecta Mathematica, 13(3):369-390, 2008. 

[17] I.M. Johnstone. On the distribution of the largest eigenvalue in principal 
components analysis. Annals of Statistics, 29:295-327, 2001. 

[18] Jose Ramirez, Brian Rider and Balint Virag. Beta ensembles, stochastic 
Airy spectrum, and a diffusion. Journal of the American Mathematical 
Society, 24(4):919-944, 2011. 

[19] Daphne Roller and Nir Friedman. Probabilistic Graphical Models. MIT 
Press, Cambridge, 2009. 

[20] Michel Ledoux. The concentration of measure phenomenon. In Peter 
Landweber, Michael Loss, Tudor Ratiu and J.T. Stafford, editor, Math¬ 
ematical Surveys and Monographs, volume 89. American Mathematical So¬ 
ciety, 2001. 

[21] Martin Lotz. On the volume of tubular neighborhoods of real algebraic 
varieties. arXiv:1210.3742, 2012. 

[22] Oren Mangoubi and Alan Edelman. Concentration of kinematic measure. 
In Preparation, 2015. 

[23] Vitali D. Milman. A new proof of A. Dvoretzky’s theorem on cross-sections 
of convex bodies. Funkcional. Anal, i Prilozhen, 5(4):28-37, 1971. 

[24] Ming-Hui Chen, Qi-Man Shao and Joseph G. Ibrahim. Monte Carlo meth¬ 
ods in Bayesian computation. Springer, 2000. 

[25] Boaz Nadler. On the distribution of the ratio of the largest eigenvalue to 
the trace of a Wishart matrix. Journal of Multivariate Analysis, 102, 2011. 

[26] Radford M. Neal. MCMC using Hamiltonian dynamics. In Steve Brooks, 
Andrew Gelman, Galin L. Jones and Xiao-Li Meng, editor. Handbook of 
Markov Chain Monte Carlo. CRC Press, 2011. 

[27] Stepan Yu. Orevkov. Sharpness of Risler’s upper bound for the total curva¬ 
ture of an affine real algebraic hypersurface. Russian Mathematical Surveys, 
62:393-394, 2007. 


39 



[28] Persi Diaconis, Susan Holmes and Mehrdad Shahshahani. Sampling from 
a manifold. In Advances in Modern Statistical Theory and Applications: 
A Festschrift in Honor of Morris L. Eaton, pages 102-125. Institute of 
Mathematical Statistics, 2013. 

[29] R.H. Baayen, D.J. Davidson and D.M. Bates. Mixed-effects modeling with 
crossed random effects for subjects and items. Journal of Memory and 
Language, 59:390-412, 2008. 

[30] Jean-Jaques Risler. On the curvature of the real milnor fiber. Bull. London 
Math. Soc., 35(4):445-454, 2003. 

[31] Luis A. Santalo. Integral geometry and geometric probability. In Gian- 
Carlo Rota, editor. Encyclopedia of Mathematics and its Applications, vol¬ 
ume 1. Addison-Wesley Publishing Company, 1976. 

[32] Rolf Schneider and Wolfgang Weil. Stochastic and Integral Geometry. 
Springer, 2008. 

[33] Michael Spivak. A comprehendsive introduction to differential geometry, 
Vol. III. Publish or Perish, Inc., Berkeley, 1999. 

[34] Michael Spivak. A comprehendsive introduction to differential geometry, 
Vol. V. Publish or Perish, Inc., Berkeley, 1999. 

[35] Brian D. Sutton. The stochastic operator approach to random matrix theory. 
PhD thesis, Massachusetts Institute of Technology, 2011. 

[36] Mihai Tibar and Dirk Siersma. Curvature and Gauss-Bonnet defect 
of global affine hypersurfaces. Bulletin des Sciences Mathematiques, 
130(2):110-122, 2006. 

[37] Tony Lelievre, Mathias Rousset, and Gabriel Stoltz. Free energy computa¬ 
tions: A Mathematical Perspective. Imperial College Press, 2010. 

[38] L. N. Trefethen and D. Bau III. Numerical Linear Algebra. SIAM, 1997. 

[39] Yongtao Guan and Stephen M. Krone. Small-world MCMC and conver¬ 
gence to multi-modal distributions: from slow mixing to fast mixing. The 
Annals of Applied Probability, 17(l):284-304, 2007. 

[40] Chenchang Zhu. The Gauss-Bonnet theorem and its applications. 
http://math.berkeley.edu/ alanw/240papers00/zhu.pdf, 2004. 


40 



